无人驾驶飞机(UAV)跟踪对于诸如交货和农业等广泛应用具有重要意义。该领域的先前基准分析主要集中在小规模的跟踪问题上,同时忽略了数据模式的类型,目标类别和方案的多样性以及所涉及的评估协议的数量,从而极大地隐藏了深度无人机跟踪的巨大功能。在这项工作中,我们提出了迄今为止最大的公共无人机跟踪基准Webuav-3M,以促进深度无人机跟踪器的开发和评估。 Webuav-3M在4,500个视频中包含超过330万帧,并提供223个高度多样化的目标类别。每个视频都通过有效且可扩展的半自动目标注释(SATA)管道密集注释。重要的是,要利用语言和音频的互补优势,我们通过提供自然语言规格和音频描述来丰富Webuav-3M。我们认为,这种增加将大大促进未来的研究,以探索语言功能和音频提示,用于多模式无人机跟踪。此外,构建了scenario约束(UTUSC)评估协议和七个具有挑战性的场景子测验集,以使社区能够开发,适应和评估各种类型的高级跟踪器。我们提供了43个代表性跟踪器的广泛评估和详细分析,并设想了深度无人机跟踪及其他领域的未来研究方向。数据集,工具包和基线结果可在\ url {https://github.com/983632847/webuav-3m}中获得。
translated by 谷歌翻译
深度神经网络(DNN)已广泛采用健康风险预测,以提供医疗保健诊断和治疗。为了评估其稳健性,现有研究在型号参数可访问的白色/灰度箱设置中进行对抗性攻击。然而,即使大多数现实世界的型号训练私有数据并在云上作为黑匣子服务发布,也是更现实的黑盒对抗性攻击。为了填补这一差距,我们提出了针对Medattacker的健康风险预测模型的第一个黑匣子对抗攻击方法来调查他们的脆弱性。 MedAttacker通过两个步骤解决了EHR数据所带来的挑战:层次定位选择,它选择强化学习(RL)框架中的攻击位置并替换替代替代基于分数的原则。特别是,通过考虑EHR中的时间上下文,它通过使用每次访问的贡献分数和每个代码的显着分数来初始化其RL位置选择策略,这可以与决定性变化决定的确定性替代选择过程很好地集成。在实验中,Medattacker始终如一地实现了最高的平均成功率,并且在某些情况下攻击了在多次真实数据集中的黑匣子环境中的三个高级健康风险预测模型时,最近的白盒EHR对抗攻击技术甚至优于最近的白盒EHR对抗性攻击技术。此外,基于实验结果,我们包括讨论捍卫EHR对抗性攻击。
translated by 谷歌翻译
人重新识别(RE-ID)在公共安全和视频监控等应用中起着重要作用。最近,从合成数据引擎的普及中获益的合成数据学习,从公众眼中引起了极大的关注。但是,现有数据集数量,多样性和变性有限,并且不能有效地用于重新ID问题。为了解决这一挑战,我们手动构造一个名为FineGPR的大型人数据集,具有细粒度的属性注释。此外,旨在充分利用FineGPR的潜力,并推广从数百万综合数据的高效培训,我们提出了一个名为AOST的属性分析流水线,它动态地学习了真实域中的属性分布,然后消除了合成和现实世界之间的差距因此,自由地部署到新场景。在基准上进行的实验表明,FineGPR具有AOST胜过(或与)现有的实际和合成数据集,这表明其对重新ID任务的可行性,并证明了众所周知的较少的原则。我们的Synthetic FineGPR数据集可公开可用于\ URL {https://github.com/jeremyxsc/finegpr}。
translated by 谷歌翻译
交通模拟器是运输系统运营和计划中的重要组成部分。常规的交通模拟器通常采用校准的物理跟踪模型来描述车辆的行为及其与交通环境的相互作用。但是,没有普遍的物理模型可以准确地预测不同情况下车辆行为的模式。鉴于交通动态的非平稳性质,固定的物理模型在复杂的环境中往往不太有效。在本文中,我们将流量模拟作为一个反向加强学习问题,并提出一个参数共享对抗性逆增强学习模型,以进行动态射击模拟学习。我们提出的模型能够模仿现实世界中车辆的轨迹,同时恢复奖励功能,从而揭示了车辆的真实目标,这是不同动态的不变。关于合成和现实世界数据集的广泛实验表明,与最先进的方法相比,我们方法的出色性能及其对流量变化动态的鲁棒性。
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译
Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? From investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR. Digitalization is a condition to leverage ubiquitous machine intelligence. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large is facing an excellent opportunity to bring the new capabilities of ubiquitous machine intelligence and trustworthy AI principles, as well as digital trust, together in various engineering systems design to ensure the trustworthiness of systems in Industry 4.0.
translated by 谷歌翻译
Surgical robot automation has attracted increasing research interest over the past decade, expecting its huge potential to benefit surgeons, nurses and patients. Recently, the learning paradigm of embodied AI has demonstrated promising ability to learn good control policies for various complex tasks, where embodied AI simulators play an essential role to facilitate relevant researchers. However, existing open-sourced simulators for surgical robot are still not sufficiently supporting human interactions through physical input devices, which further limits effective investigations on how human demonstrations would affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we establish our platform based on our previously released SurRoL simulator with several new features co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate the action patterns to achieve more effective policy learning. We showcase the improvement of our simulation environment with the designed new features and tasks, and validate state-of-the-art reinforcement learning algorithms using the interactive environment. Promising results are obtained, with which we hope to pave the way for future research on surgical embodied intelligence. Our platform is released and will be continuously updated in the website: https://med-air.github.io/SurRoL/
translated by 谷歌翻译